We explore the limits of the autoregressive (AR) sieve bootstrap,
and show that its applicability extends well beyond the realm of linear
time series, contrary to what has been previously thought. In particular, for
appropriate statistics, the AR-sieve bootstrap is valid for stationary
processes possessing a general Wold-type autoregressive representation
with respect to a white noise; in essence, this includes all stationary,
purely nondeterministic processes whose spectral density is everywhere
positive. Our main theorem provides a simple and effective
tool for assessing whether the AR-sieve bootstrap is asymptotically
valid in any given situation. In effect, the large-sample distribution
of the statistic in question must only depend on the first and second
order moments of the process; prominent examples include the
sample mean and the spectral density. As a counterexample, we show
how the AR-sieve bootstrap is not always valid for the sample
autocovariance even when the underlying process is linear.

1. Introduction. Due to the different possible dependence structures that 
may occur in time series analysis, several bootstrap procedures have been 
proposed to infer properties of a statistic of interest. Validity of the different 
bootstrap procedures depends on the probabilistic structure of the underlying
stochastic process $X = (X_t : t \in \mathbb{Z})$ and/or on the particular statistic
considered. Bootstrap schemes for time series range from those imposing more
parametric-type assumptions on the underlying stochastic process class to
those accounting only for some kind of mixing or weak dependence assumptions.
For an overview see Bühlmann (2002), Lahiri (2003), Politis (2003)
and Paparoditis and Politis (2009). 
A common assumption is that $X$ is a linear time series, that is, that

(1.1)  $X_t = \sum_{j=-\infty}^{\infty} b_j e_{t-j}, \qquad t \in \mathbb{Z},$

with respect to independent, identically distributed (i.i.d.) random variables
$(e_t)$, often assumed to have mean zero and finite fourth order moments,
and for absolutely summable coefficients $(b_j)$; this is not to be confused with
the Wold representation with respect to white noise, that is, uncorrelated,
errors that all stationary, purely nondeterministic processes possess. If $b_j = 0$
for all $j < 0$, then the linear process is called causal.

Stationary autoregressive (AR) processes of order p are members of the 
linear class (1.1) provided the autoregression is defined on the basis of i.i.d. 
errors. Model-based bootstrapping in the AR(p) case was among the first 
bootstrap proposals for time series; see, for example, Freedman (1984). The 
extension to the AR($\infty$) case was inevitable; this refers to the situation
where the strictly stationary process $X_t$ has the following linear infinite
order autoregressive representation

(1.2)  $X_t = \sum_{j=1}^{\infty} \pi_j X_{t-j} + e_t, \qquad t \in \mathbb{Z},$

with respect to i.i.d. errors $e_t$ having mean zero, variance $0 < E(e_t^2) = \sigma_e^2$ and
$E(e_t^4) < \infty$; here the coefficients $\pi_j$ are assumed absolutely summable and
$\pi(z) = 1 - \sum_{j=1}^{\infty} \pi_j z^j \neq 0$ for $|z| = 1$. The two representations, (1.1) and (1.2),
are related; in fact, the class (1.2) is a subset of the linear class (1.1).
Furthermore, it can be shown that the linear AR($\infty$) process (1.2) is causal if
and only if $\pi(z) = 1 - \sum_{j=1}^{\infty} \pi_j z^j \neq 0$ for $|z| \le 1$.

There is already a large body of literature dealing with applications and 
properties of the AR-sieve bootstrap. Kreiss (1988, 1992) established valid- 
ity of this bootstrap scheme for different statistics including autocovariances 
and autocorrelations. Paparoditis and Streitberg (1992) established asymp- 
totic validity of the AR-sieve bootstrap to infer properties of high order 
autocorrelations, and Paparoditis (1996) established its validity in a
multivariate time series context. The aforementioned results required an
exponential decay of the AR coefficients $\pi_j$ as $j \to \infty$; Bühlmann (1997) extended
the class of AR($\infty$) processes for which the AR-sieve bootstrap works by
allowing a polynomial decay of the $\pi_j$ coefficients. Furthermore, Bickel and
Bühlmann (1999) introduced a mixing concept appropriate for investigating
properties of the AR-sieve bootstrap which is related to the weak depen- 
dence concept of Doukhan and Louhichi (1999), while Choi and Hall (2000) 
focused on properties of the AR-sieve bootstrap-based confidence intervals. 

A basic assumption in the current literature on the AR-sieve bootstrap
is that $X$ is a linear AR($\infty$) process, that is, $X_t$ is generated by (1.2) with
$(e_t)$ being an i.i.d. process. One exception is the case of the sample mean
$\bar X_n = n^{-1} \sum_{t=1}^{n} X_t$, where Bühlmann (1997) proved validity of the AR-sieve
bootstrap also for the case where the assumption of i.i.d. errors in (1.2)
is relaxed to that of martingale differences, that is, $E(e_t \mid \mathcal{E}_{t-1}) = 0$ and
$E(e_t^2 \mid \mathcal{E}_{t-1}) = \sigma_e^2$ with $\mathcal{E}_{t-1} = \sigma(\{e_s : s \le t-1\})$ the $\sigma$-algebra generated by
the random variables $\{e_{t-1}, e_{t-2}, \ldots\}$. Notice that the process (1.2) with
innovations forming a martingale difference sequence is in some sense not "very
far" from the linear process (1.2) with i.i.d. errors. In fact, some authors call
the set-up of model (1.1) with martingale difference errors "weak linearity,"
and the same would hold regarding (1.2); see, for example, Kokoszka and
Politis (2011).

To elaborate, for a causal linear process the general $L_2$-optimal predictor
of $X_{t+k}$ based on its past $X_t, X_{t-1}, \ldots$, namely the conditional expectation
$E(X_{t+k} \mid X_s, s \le t)$, is identical to the best linear predictor
$\mathcal{P}_{\mathcal{M}_t}(X_{t+k})$; here $k$ is assumed positive, $\mathcal{P}_{\mathcal{C}}$ denotes orthogonal
projection onto the set $\mathcal{C}$ and $\mathcal{M}_s = \overline{\mathrm{span}}\{X_j : j \le s\}$, that is, the closed
linear span generated by the random variables $\{X_j : j \le s\}$. The property of
linearity of the optimal predictor is shared by causal processes that are only 
weakly linear. Recently, under the assumption of weak linearity with (1.2), 
Poskitt (2008) claimed validity of the AR-sieve bootstrap for a much wider 
class of statistics that are defined as smooth functions of means. However, 
this claim does not seem to be correct in general. In particular, our Exam- 
ple 3.2 of Section 3 contradicts Theorem 2 of Poskitt (2008); see Remark 3.2 
in what follows. 

The aim of the present paper is to explore the limits of the AR-sieve boot- 
strap, and to give a definitive answer to the question concerning for which 
classes of statistics, and for which dependence structures, is the AR-sieve 
bootstrap asymptotically valid. Moreover, we also address the question of what
the AR-sieve bootstrap really does when it is applied to data stemming from
a stationary process not fulfilling strict regularity assumptions such as
linearity or weak linearity. In order to do this, we examine in detail in Section 2
processes possessing a so-called general autoregressive representation with
respect to white noise errors; these form a much wider class of processes
than the linear AR($\infty$) class described by (1.2).

Our theoretical results in Section 3 provide an effective and simple tool for 
gauging consistency of the AR-sieve bootstrap. They imply that for certain 
classes of statistics the range of the validity of the AR-sieve bootstrap goes 
far beyond that of the linear class (1.1). On the other hand, for other classes 
of statistics, like for instance autocorrelations, validity of the AR-sieve boot- 
strap is restricted to the linear process class (1.1), while for statistics like 
autocovariances, the AR-sieve bootstrap is only valid for the linear AR($\infty$)
class (1.2). But even in the case of the linear autoregression (1.2) with infinite
order, the theory developed in this paper provides a further generalization
of existing results since it establishes validity of this bootstrap procedure
under weaker assumptions on the summability of the coefficients $\pi_j$, thus
relaxing previous assumptions referring to exponential or polynomial decay
of these coefficients.

The remainder of the paper is organized as follows. Section 2 develops
the background concerning the Wold-type infinite order AR representation 
that is required to study the AR-sieve bootstrap, and states the necessary 
assumptions to be imposed on the underlying process class and on the pa- 
rameters of the bootstrap procedure. Section 3 presents our main result and 
discusses its implications by means of several examples. Proofs and technical 
details are deferred to the Appendix. 

2. The AR-sieve bootstrap and general autoregressive representation. 

Here, and throughout the paper, we assume that we have observations $X_1, \ldots,
X_n$ stemming from a strictly stationary process $X$. Let $T_n = T_n(X_1, \ldots, X_n)$
be an estimator of some unknown parameter $\theta$ of the underlying stochastic
process $X$. Suppose that for some appropriately increasing sequence of real
numbers $\{c_n : n \in \mathbb{N}\}$ the distribution $\mathcal{L}_n = \mathcal{L}(c_n(T_n - \theta))$ has a nondegenerate
limit. The AR-sieve bootstrap proposal to estimate the distribution $\mathcal{L}_n$
goes as follows:

Step 1: Select an order $p = p(n) \in \mathbb{N}$, $p \ll n$, and fit a $p$th order autoregressive
model to $X_1, X_2, \ldots, X_n$. Denote by $\hat a(p) = (\hat a_j(p), j = 1, 2, \ldots, p)$ the
Yule-Walker autoregressive parameter estimators, that is, $\hat a(p) = \hat\Gamma(p)^{-1} \hat\gamma_p$,
where for $0 \le h \le p$,

$\hat\gamma_X(h) = n^{-1} \sum_{t=1}^{n-|h|} (X_t - \bar X_n)(X_{t+|h|} - \bar X_n),$

$\bar X_n = n^{-1} \sum_{t=1}^{n} X_t$, $\hat\Gamma(p) = (\hat\gamma_X(r-s))_{r,s=1,2,\ldots,p}$ and $\hat\gamma_p = (\hat\gamma_X(1), \hat\gamma_X(2), \ldots,
\hat\gamma_X(p))'$.

Step 2: Let $\hat\varepsilon_t(p) = X_t - \sum_{j=1}^{p} \hat a_j(p) X_{t-j}$, $t = p+1, p+2, \ldots, n$, be the
residuals of the autoregressive fit and denote by $\hat F_n$ the empirical distribution
function of the centered residuals $\tilde\varepsilon_t(p) = \hat\varepsilon_t(p) - \bar\varepsilon$, where $\bar\varepsilon = (n -
p)^{-1} \sum_{t=p+1}^{n} \hat\varepsilon_t(p)$. Let $(X_1^*, X_2^*, \ldots, X_n^*)$ be a set of observations from the
time series $X^* = \{X_t^* : t \in \mathbb{Z}\}$ where $X_t^* = \sum_{j=1}^{p} \hat a_j(p) X_{t-j}^* + e_t^*$ and the $e_t^*$'s
are independent random variables having identical distribution $\hat F_n$.

Step 3: Let $T_n^* = T_n(X_1^*, X_2^*, \ldots, X_n^*)$ be the same estimator as $T_n$ based
on the pseudo-time series $X_1^*, X_2^*, \ldots, X_n^*$ and $\theta^*$ the analogue of $\theta$ associated
with the bootstrap process $X^*$. The AR-sieve bootstrap approximation of $\mathcal{L}_n$
is then given by $\mathcal{L}_n^* = \mathcal{L}^*(c_n(T_n^* - \theta^*))$.

In the above (and in what follows), $\mathcal{L}^*, E^*, \ldots$ will denote probability law,
expectation, etc. in the bootstrap world (conditional on the data $X_1, \ldots, X_n$).
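
Steps 1 to 3 can be sketched in a few lines of code. The following Python sketch is an illustration only, specialized to the sample mean with $c_n = \sqrt{n}$ (so that $\theta^* = 0$, because the centered residuals have mean zero). The function names, the burn-in length and the use of the Levinson-Durbin recursion to solve the Yule-Walker equations are assumptions made here and are not taken from the text.

```python
import math
import random


def sample_autocov(x, max_lag):
    """Sample autocovariances gamma_hat(0..max_lag) with divisor n, as in Step 1."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + h] for t in range(n - h)) / n
            for h in range(max_lag + 1)]


def yule_walker(gamma, p):
    """Solve the order-p Yule-Walker equations via the Levinson-Durbin recursion."""
    a, v = [], gamma[0]
    for k in range(1, p + 1):
        refl = (gamma[k] - sum(a[j] * gamma[k - 1 - j] for j in range(k - 1))) / v
        a = [a[j] - refl * a[k - 2 - j] for j in range(k - 1)] + [refl]
        v *= 1.0 - refl * refl
    return a


def ar_sieve_bootstrap_means(x, p, n_boot, burn_in=200, seed=0):
    """Return the fitted AR coefficients and n_boot replicates of sqrt(n) * mean(X*)."""
    rng = random.Random(seed)
    n = len(x)
    # Step 1: Yule-Walker fit of order p
    a = yule_walker(sample_autocov(x, p), p)
    # Step 2: residuals of the AR(p) fit, then centering
    resid = [x[t] - sum(a[j] * x[t - 1 - j] for j in range(p)) for t in range(p, n)]
    rbar = sum(resid) / len(resid)
    resid = [r - rbar for r in resid]
    replicates = []
    for _ in range(n_boot):
        path = [0.0] * p  # start-up values, discarded through the burn-in
        for _ in range(burn_in + n):
            e_star = rng.choice(resid)  # i.i.d. draw from the residual e.d.f.
            path.append(sum(a[j] * path[-1 - j] for j in range(p)) + e_star)
        x_star = path[-n:]
        # Step 3: statistic on the pseudo-series (theta* = 0 for the mean)
        replicates.append(math.sqrt(n) * sum(x_star) / n)
    return a, replicates
```

The empirical distribution of the returned replicates then serves as the bootstrap approximation of the law of $\sqrt{n}(\bar X_n - \theta)$; other statistics would replace only the last line of the loop and the centering $\theta^*$.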
Note that the use of Yule-Walker estimators in Step 1 is essential and
guarantees, among other things, that the complex polynomial $\hat A_p(z) = 1 -
\sum_{j=1}^{p} \hat a_j(p) z^j$ has no roots on or within the unit disc $\{z \in \mathbb{C} : |z| \le 1\}$, see
the discussion before (2.22), that is, the bootstrap process $X^*$ always is
a stationary and causal autoregressive process.

The question considered in this paper is when the bootstrap distribution
$\mathcal{L}_n^*$ can correctly approximate the distribution $\mathcal{L}_n$ of interest, and moreover
what the AR-sieve bootstrap does if the latter is not the case. To this end,
let us first discuss a general autoregressive representation of stationary pro- 
cesses. 

Recall that by the well-known Wold representation, every purely
nondeterministic, stationary and zero-mean stochastic process $X = \{X_t : t \in \mathbb{Z}\}$
can be expressed as

(2.1)  $X_t = \sum_{j=1}^{\infty} b_j u_{t-j} + u_t,$

where $\sum_{j=1}^{\infty} b_j^2 < \infty$ and $u_t = X_t - \mathcal{P}_{\mathcal{M}_{t-1}}(X_t)$ is a zero mean, white noise
"innovation" process with finite variance $0 < \sigma_u^2 = E(u_t^2) < \infty$; recall that
$\mathcal{M}_s = \overline{\mathrm{span}}\{X_j : j \le s\}$.

Less known is that for all purely nondeterministic, stationary and
zero-mean time series unique autoregressive coefficients $(a_k : k \in \mathbb{N})$ exist that
only depend on the autocovariance function of the time series $(X_t)$, such
that for any $n \in \mathbb{N}$,

(2.2)  $\mathcal{P}_{\mathcal{M}_{t-1}}(X_t) = \sum_{k=1}^{n} a_k X_{t-k} + \varepsilon_{t,n}, \qquad t \in \mathbb{Z},$

where $(\varepsilon_{t,n} : t \in \mathbb{Z})$ is stationary and $\varepsilon_{t,n} \in \overline{\mathrm{span}}\{X_s : s \le t-n-1\}$.

Under the additional assumption that the coefficients $(a_k, k \in \mathbb{N})$ are
absolutely summable, that is, $\sum_{k=1}^{\infty} |a_k| < \infty$, one then obtains an autoregressive,
Wold-type representation of the underlying process given by

(2.3)  $X_t = \sum_{k=1}^{\infty} a_k X_{t-k} + \varepsilon_t, \qquad t \in \mathbb{Z}.$

Here again $(\varepsilon_t : t \in \mathbb{Z})$ denotes a white noise, that is, uncorrelated, process
with finite variance $\sigma_\varepsilon^2 = E\varepsilon_t^2$ which fulfills

(2.4)  $\sigma_\varepsilon^2 = \gamma_X(0) - \sum_{k=1}^{\infty} a_k \gamma_X(k),$

where $\gamma_X(\cdot)$ denotes the autocovariance function of $X$.

Under the absolute summability assumption on the autoregressive coefficients
$(a_k)$, conditions for which will be given in Lemma 2.1 in the sequel,
we have that $X_t - \mathcal{P}_{\mathcal{M}_{t-1}}(X_t) = X_t - \sum_{k=1}^{\infty} a_k X_{t-k}$; this implies that the
white noise process $(u_t)$ appearing in (2.1) coincides with the white noise
process $(\varepsilon_t)$ in (2.3). Notice that this does not mean that an
arbitrary one-sided moving average representation of a time series $(X_t)$,
even with summable coefficients, is
the Wold representation of the process; see Remark 2.1 for an example.
Furthermore, let $f_X$ be the spectral density of $X$, that is,

$f_X(\lambda) = (2\pi)^{-1} \sum_{h \in \mathbb{Z}} \gamma_X(h) \exp\{-i\lambda h\}, \qquad \lambda \in [-\pi, \pi].$

Then, from (2.3) one immediately obtains that

(2.5)  $\left| 1 - \sum_{k=1}^{\infty} a_k e^{-ik\lambda} \right|^2 f_X(\lambda) = \frac{\sigma_\varepsilon^2}{2\pi}, \qquad \lambda \in [-\pi, \pi],$

which implies that for strictly positive spectral densities $f_X$ the power series
$A(z) := 1 - \sum_{k=1}^{\infty} a_k z^k$ has no zeroes with $|z| = 1$. For more details of the
autoregressive Wold representation (2.3) see Pourahmadi (2001), Lemma 6.4(b),
(6.10) and (6.12). It is worth mentioning that in the historical evolution of 
Wold decompositions the autoregressive variant preceded the moving aver- 
age one. 

Remark 2.1. If we consider a purely nondeterministic and stationary 
time series possessing a standard one-sided moving average representation, 
and if we additionally assume that the spectral density is bounded away from 
zero and that the moving average coefficients bj are absolutely summable, 
then this would imply that the polynomial $B(z) = 1 + \sum_{j=1}^{\infty} b_j z^j$ has no
zeroes with magnitude equal to one. There may of course exist zeroes within
the unit disk. But since the closed unit disk is compact and B(z) repre- 
sents a holomorphic function there could exist only finitely many zeroes 
with magnitude less than one. Following the technique described in Kreiss 
and Neuhaus [(2006), Section 7.13] one may switch to another moving av- 
erage model for which the polynomial has no zeroes within the unit disk. 
This procedure definitely changes the white noise process; for example, if 
the white noise process in the assumed moving average representation con- 
sists of independent random variables, this desirable feature typically is lost 
when switching to the moving average model with all zeroes within the unit 
disk removed. In fact, only the property of uncorrelatedness is preserved. 
The modified moving average process allows then for an autoregressive rep- 
resentation of infinite order and this process, because of the uniqueness of 
the autoregressive representation, coincides with the one in (2.3). 

The following simple example, taken from Brockwell and Davis [(1991),
Example 3.5.2] illustrates these points. Based on i.i.d. random variables $(e_t)$
with mean zero and finite and nonvanishing variance $\sigma^2$, construct the simple
MA(1)-process

(2.6)  $X_t = e_t - 2e_{t-1}, \qquad t \in \mathbb{Z}.$

This MA(1)-model is not invertible to an autoregressive process. However,
a general autoregressive representation as described above exists. In order to
obtain this representation denote by $L$ the usual lag-operator and consider
$B(L) = 1 - 2L$ as well as $\tilde B(L) := 1 - 0.5L$. Of course

(2.7)  $X_t = \tilde B(L) \frac{B(L)}{\tilde B(L)} e_t.$

Since $|B(e^{-i\lambda})|^2 / |\tilde B(e^{-i\lambda})|^2 = 4$, we obtain that
$\varepsilon_t := (B(L)/\tilde B(L)) e_t$ is again an (uncorrelated) white noise process with
variance $\sigma_\varepsilon^2 = 4\sigma^2$. Moreover, we have

(2.8)  $X_t = \varepsilon_t - 0.5\varepsilon_{t-1} = -\sum_{j=1}^{\infty} 0.5^j X_{t-j} + \varepsilon_t.$

Obviously $\varepsilon_t = X_t - \mathcal{P}_{\mathcal{M}_{t-1}}(X_t)$, which means that (2.8) and not (2.6) is the
Wold representation of the time series $(X_t)$. This also means that the
modified moving average (or Wold) representation of the process $(X_t)$ possesses
only uncorrelated innovations $(\varepsilon_t)$ instead of independent innovations $(e_t)$.
But the representation (2.8) with uncorrelated innovations has the advantage 
that it indeed possesses an autoregressive representation of infinite order. 
Of course, via the described modification, we do not change any property of 
the process (Xt). But, and this is essential, the modification leading to the 
general AR($\infty$)-representation typically destroys an existing independence
property of the white noise in a former moving average representation. 
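
The claim that (2.6) and (2.8) describe the same second order structure can be checked numerically. The short Python sketch below is illustrative only; the helper names are hypothetical. It computes the autocovariances of an MA(1) process $X_t = e_t + \theta e_{t-1}$ for both parametrizations and lists the first coefficients of the infinite order autoregression belonging to the invertible one.

```python
def ma1_autocov(theta, var):
    """(gamma(0), gamma(1)) of X_t = e_t + theta * e_{t-1} with Var(e_t) = var."""
    return (var * (1.0 + theta * theta), var * theta)


def ar_inf_coeffs(theta, m):
    """First m coefficients a_1..a_m of the AR(infinity) form of an invertible MA(1)."""
    # X_t = (1 + theta L) eps_t with |theta| < 1 gives
    # eps_t = sum_{j>=0} (-theta)^j X_{t-j}, hence a_j = -(-theta)^j for j >= 1.
    return [-((-theta) ** j) for j in range(1, m + 1)]


sigma2 = 1.0
print(ma1_autocov(-2.0, sigma2))        # representation (2.6): (5.0, -2.0)
print(ma1_autocov(-0.5, 4.0 * sigma2))  # representation (2.8): (5.0, -2.0)
print(ar_inf_coeffs(-0.5, 4))           # [-0.5, -0.25, -0.125, -0.0625]
```

Both parametrizations give the same autocovariance function, so no second order quantity can distinguish them; only the dependence structure of the innovations differs.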

To elaborate, the problem of understanding the stochastic properties of 
the innovation process in linear time series has been thoroughly investigated 
in the literature. Breidt and Davis (1992) showed that time reversibility 
of a linear process is equivalent to the fact that the i.i.d. innovations et are 
Gaussian and used this result to derive for a class of linear processes unique- 
ness of moving average representations with i.i.d. non-Gaussian innovations 
and to discuss the stochastic properties of the innovation process appearing 
in alternative moving average representations for the same process class. 
Breidt, Davis and Dunsmuir (1995) used such results to initialize autore- 
gressive processes in Monte Carlo generation of conditional sample paths 
running autoregressive processes backward in time and Andrews, Davis and 






J.-P. KREISS, E. PAPARODITIS AND D. N. POLITIS 



Breidt (2007) for estimation problems for all-pass time series models. Prop- 
erties of the innovation process in non-Gaussian, noninvertible time series 
have been also discussed in Lii and Rosenblatt (1982, 1996). 

As we have seen the variances of $e_t$ and $\varepsilon_t$ do not coincide and the same
is true for the fourth order cumulant $E(e_t^4)/\sigma^4 - 3$, which will be of some
importance later. Using the fact that $\varepsilon_t$ is defined via a linear transformation
on the i.i.d. sequence $(e_t)$ we obtain by straightforward computation

(2.9)  $\frac{E(\varepsilon_t^4)}{(E(\varepsilon_t^2))^2} - 3 = \frac{2}{5} \frac{E(e_t^4)}{\sigma^4} - \frac{6}{5},$

which only equals $E(e_t^4)/\sigma^4 - 3$ in case the latter quantity is equal to 0, for
example, when the $e_t$ are normally distributed. The normally distributed
case always leads to the fact that uncorrelatedness and independence are 
equivalent, thus implying that the white noise process in the general au- 
toregressive representation always consists of independent and normally dis- 
tributed random variables which leads for the autoregressive sieve bootstrap 
in some cases to a considerable simplification as we will see later. 
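
The relation between the normalized fourth order cumulants of the two innovation sequences of the MA(1) example can be traced as follows. This is a sketch of the computation, not taken verbatim from the text; the filter weights $c_j$ are notation introduced here for illustration, and $\sigma^2$ denotes the variance of the i.i.d. innovations in (2.6).

```latex
% Wold innovations of (2.8) as a one-sided filter of the i.i.d. sequence in (2.6)
\varepsilon_t = \frac{1 - 2L}{1 - 0.5L}\, e_t = \sum_{j \ge 0} c_j e_{t-j},
\qquad c_0 = 1, \quad c_j = -\tfrac{3}{2}\,(0.5)^{j-1} \ (j \ge 1),

% second and fourth power sums of the weights
\sum_{j \ge 0} c_j^2 = 1 + \tfrac{9}{4} \cdot \tfrac{4}{3} = 4,
\qquad
\sum_{j \ge 0} c_j^4 = 1 + \tfrac{81}{16} \cdot \tfrac{16}{15} = \tfrac{32}{5},

% cumulants of independent summands add, so the normalized cumulant scales by the ratio
\frac{E(\varepsilon_t^4)}{(E(\varepsilon_t^2))^2} - 3
= \frac{\sum_j c_j^4}{\big(\sum_j c_j^2\big)^2} \left( \frac{E(e_t^4)}{\sigma^4} - 3 \right)
= \frac{2}{5} \left( \frac{E(e_t^4)}{\sigma^4} - 3 \right).
```

The first power sum reproduces the variance ratio 4 found above, and the factor $2/5 \neq 1$ shows that the two normalized cumulants agree only when both vanish.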

In order to get conditions which ensure the absolute summability of the
autoregressive coefficients $(a_k, k \in \mathbb{N})$, one can go back to an important paper
by Baxter (1962). Informally speaking, it is the smoothness of the spectral
density $f_X$ which ensures summability of these coefficients. To be more
precise, we have the following result.

Lemma 2.1. (i) If $f_X$ is strictly positive and continuous and if

$\sum_{h=0}^{\infty} h^r |\gamma_X(h)| < \infty$

for some $r \ge 0$, then

(2.10)  $\sum_{h=0}^{\infty} h^r |a_h| < \infty.$

(ii) If $f_X$ is strictly positive and possesses $k \ge 2$ derivatives, then

(2.11)  $\sum_{h=0}^{\infty} h^r |\gamma_X(h)| < \infty \qquad \forall r < k - 1.$

Proof. Cf. Baxter (1962), pages 140 and 142. □

The uniquely determined autoregressive coefficients $(a_k)$ are closely
related to the coefficients of an optimal (in the mean square sense)
autoregressive fit of order $p$, or equivalently, to prediction coefficients based on the
finite past. To be precise, denote the minimizers of

(2.12)  $E\left( X_t - \sum_{j=1}^{p} c_j X_{t-j} \right)^2$

by $a_1(p), \ldots, a_p(p)$, which of course are solutions of the following
Yule-Walker linear equations:

(2.13)  $\begin{pmatrix} \gamma_X(0) & \cdots & \gamma_X(p-1) \\ \vdots & \ddots & \vdots \\ \gamma_X(p-1) & \cdots & \gamma_X(0) \end{pmatrix} \begin{pmatrix} c_1 \\ \vdots \\ c_p \end{pmatrix} = \begin{pmatrix} \gamma_X(1) \\ \vdots \\ \gamma_X(p) \end{pmatrix}.$

Recall from Brockwell and Davis [(1991), Proposition 5.1.1] that the
covariance matrix $\Gamma(p)$ on the left-hand side is invertible for all $p$ provided
$\gamma_X(0) > 0$ and $\gamma_X(h) \to 0$ as $h \to \infty$.

Now by slight modifications of Baxter (1962), Theorem 2.2 [cf. also Pourahmadi
(2001), Theorem 7.22], we obtain the following helpful result relating
the coefficients $a_k(p)$ of the $p$th order autoregressive fit to the $(a_k)$ of the
general autoregressive representation.

Lemma 2.2. Assume that $f_X$ is strictly positive and continuous and
that $\sum_{h=0}^{\infty} (1+h)^r |\gamma_X(h)| < \infty$ for some $r \ge 0$. Then there exist $p_0 \in \mathbb{N}$ and
$C > 0$ (both depending on $f_X$ only) such that for all $p \ge p_0$,

(2.14)  $\sum_{k=0}^{p} (1+k)^r |a_k(p) - a_k| \le C \cdot \sum_{k=p+1}^{\infty} (1+k)^r |a_k|$

as well as

(2.15)  $\sum_{k=1}^{\infty} (1+k)^r |a_k| < \infty.$

This means that we typically can achieve a polynomial rate of convergence
of $a_k(p)$ toward $a_k$.
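
The approximation of the infinite order coefficients by the finite order Yule-Walker solutions can be observed numerically on the MA(1) process (2.6). The following Python sketch is an illustration under assumptions: it solves the system (2.13) by plain Gaussian elimination (a choice made here for self-containedness), takes unit innovation variance, and uses the coefficients $a_k = -0.5^k$ from (2.8) as the limit. All function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def yule_walker_theoretical(gamma, p):
    """a_1(p), ..., a_p(p) solving (2.13) for an autocovariance function gamma(h)."""
    G = [[gamma(abs(r - s)) for s in range(p)] for r in range(p)]
    rhs = [gamma(h) for h in range(1, p + 1)]
    return solve_linear(G, rhs)


def gamma_ma1(h):
    """Autocovariance of X_t = e_t - 2 e_{t-1} with Var(e_t) = 1."""
    return {0: 5.0, 1: -2.0}.get(abs(h), 0.0)


def approximation_error(p):
    """sum_{k=1}^p |a_k(p) - a_k| with a_k = -0.5^k, i.e. the case r = 0."""
    a_p = yule_walker_theoretical(gamma_ma1, p)
    return sum(abs(a_p[k] + 0.5 ** (k + 1)) for k in range(p))


for p in (1, 2, 5, 10, 20):
    tail = 0.5 ** p  # sum_{k>p} |a_k| for a_k = -0.5^k
    print(p, approximation_error(p), tail)
```

In this example the error on the left of (2.14) with $r = 0$ stays below the tail sum on the right for every order printed, and both decay geometrically because the coefficients $a_k$ do.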

As already mentioned, $\gamma_X(0) > 0$ and $\gamma_X(h) \to 0$ as $h \to \infty$ ensure
nonsingularity of all autocovariance matrices appearing in the left-hand side
of (2.13). Since these matrices are positive semidefinite this means that under
these conditions $\Gamma(p)$ actually is positive definite. This in turn with Kreiss
and Neuhaus [(2006), Section 8.7] implies that the polynomial $A_p(z) =
1 - \sum_{k=1}^{p} a_k(p) z^k$ has no zeroes in the closed unit disk. We can even prove
a slightly stronger result.

Lemma 2.3. Assume that $f_X$ is strictly positive and continuous, that
$\sum_{h=0}^{\infty} |\gamma_X(h)| < \infty$ and $\gamma_X(0) > 0$. Then there exist $\delta > 0$ and $p_0 \in \mathbb{N}$ such
that for all $p \ge p_0$,

(2.16)  $\inf_{|z| \le 1 + 1/p} \left| 1 - \sum_{k=1}^{p} a_k(p) z^k \right| \ge \delta > 0.$

The uniform convergence of $A_p(z)$ toward $A(z)$ on the closed unit disk
immediately implies the following corollary to Lemma 2.3.

Corollary 2.1. Under the assumption of Lemma 2.3, we have

(2.17)  $A(z) = 1 - \sum_{j=1}^{\infty} a_j z^j \neq 0 \qquad \forall |z| \le 1.$

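Lemma 2.3 and Corollary 2.1 now enable us to invert the power series $A(z)$
as well as the polynomial $A_p(z)$. Let us denote

(2.18)  $\left( 1 - \sum_{j=1}^{\infty} a_j z^j \right)^{-1} = 1 + \sum_{j=1}^{\infty} \alpha_j z^j \qquad \forall |z| \le 1$

and for all $p$ large enough (because of Lemma 2.3)

(2.19)  $\left( 1 - \sum_{j=1}^{p} a_j(p) z^j \right)^{-1} = 1 + \sum_{j=1}^{\infty} \alpha_j(p) z^j \qquad \forall |z| \le 1 + \frac{1}{p}.$

From (2.19), one immediately obtains that

(2.20)  $|\alpha_j(p)| \le C \cdot \left( 1 + \frac{1}{p} \right)^{-j} \qquad \forall j \in \mathbb{N}.$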
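
The coefficients of the inverted series can be computed recursively by comparing powers of $z$. The Python sketch below is illustrative; the function name is hypothetical, and it uses only the identity that the product of an autoregressive polynomial and its inverse equals one.

```python
def invert_ar_polynomial(a, m):
    """First m coefficients alpha_1..alpha_m of 1 / (1 - sum_j a_j z^j), a = [a_1, ..., a_p]."""
    alpha = [1.0]  # alpha_0 = 1
    for k in range(1, m + 1):
        # Coefficient of z^k in (1 - sum_j a_j z^j) * (sum_i alpha_i z^i) must vanish:
        # alpha_k = sum_{j=1}^{min(k, p)} a_j * alpha_{k-j}
        alpha.append(sum(a[j - 1] * alpha[k - j]
                         for j in range(1, min(k, len(a)) + 1)))
    return alpha[1:]


print(invert_ar_polynomial([0.5], 3))  # AR(1) with a_1 = 0.5: [0.5, 0.25, 0.125]

# Truncated coefficients a_j = -0.5^j of the MA(1) example (2.8)
a_trunc = [-(0.5 ** j) for j in range(1, 11)]
print(invert_ar_polynomial(a_trunc, 3))
```

For the truncated coefficients of (2.8) the recursion returns $\alpha_1 = -0.5$ followed by zeros for the leading terms, in line with the moving average form $X_t = \varepsilon_t - 0.5\varepsilon_{t-1}$.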

A further auxiliary result contains the transfer of the approximation
property of $a_k(p)$ for $a_k$ to the respective coefficients $\alpha_k(p)$ and $\alpha_k$ of the inverted
series. For such a result, we make use of a weighted version of Wiener's
lemma; cf. Gröchenig (2007).

Lemma 2.4. Under the assumptions of Lemma 2.3 and additionally
$\sum_{h=0}^{\infty} (1+h)^r |\gamma_X(h)| < \infty$ for some $r \ge 0$ there exists a constant $C > 0$ such
that for all $p$ large enough

(2.21)  $\sum_{j=1}^{\infty} (1+j)^r |\alpha_j(p) - \alpha_j| \le C \cdot \sum_{j=p+1}^{\infty} (1+j)^r |a_j| \longrightarrow 0 \quad (p \to \infty).$
In a final step of this section, we now move on to estimators of the
coefficients $a_k(p)$. The easiest one might think of is to replace in (2.13) the
theoretical autocovariance function by its sample version $\hat\gamma_X(h)$. Denote
the resulting Yule-Walker estimators of $a_k(p)$ by $\hat a_k(p)$, $k = 1, \ldots, p$. Under the
typically satisfied assumption that $\hat\gamma_X(0) > 0$, these Yule-Walker estimators are
uniquely determined and moreover fulfill (by the same arguments
already used) the desired property that

(2.22)  $\hat A_p(z) = 1 - \sum_{k=1}^{p} \hat a_k(p) z^k \neq 0 \qquad \forall |z| \le 1.$

Thus, we can also invert the polynomial $\hat A_p(z)$ and we denote

(2.23)  $\left( 1 - \sum_{k=1}^{p} \hat a_k(p) z^k \right)^{-1} = 1 + \sum_{k=1}^{\infty} \hat\alpha_k(p) z^k, \qquad |z| \le 1.$
We require that the estimators $(\hat a_k(p) : k = 1, \ldots, p)$ converge, even at
a slow rate, to their theoretical counterparts, namely:

(A1) $p(n)^2 \cdot \sum_{1 \le k \le p(n)} |\hat a_k(p(n)) - a_k(p(n))| = O_P(1)$, where $p(n)$ denotes
a sequence of integers converging to infinity at a rate to be specified.

Assumption (A1), for example, is met if a sufficiently fast rate of
convergence for the empirical autocovariances toward their theoretical counterparts
can be guaranteed. The convergence property of $\hat a_k(p)$ carries over to the
corresponding coefficients $\hat\alpha_k(p)$ of the inverted polynomials [cf. (2.23)] as is
specified in the following lemma.

Lemma 2.5. Under the assumptions of Lemma 2.3 and (A1), we have
uniformly in $k \in \mathbb{N}$

(2.24)  $|\hat\alpha_k(p(n)) - \alpha_k(p(n))| \le \left( 1 + \frac{1}{p(n)} \right)^{-k} \frac{1}{p(n)^2}\, O_P(1).$

3. Validity of the AR-sieve bootstrap. 

3.1. Functions of generalized means. Consider a general class of estimators

(3.1)  $T_n = f\left( \frac{1}{n-m+1} \sum_{t=1}^{n-m+1} g(X_t, \ldots, X_{t+m-1}) \right),$

discussed in Künsch (1989), cf. Example 2.2; here $g : \mathbb{R}^m \to \mathbb{R}^d$ and $f : \mathbb{R}^d \to \mathbb{R}$.
For this class of statistics, Bühlmann (1997) proved validity of the AR-sieve
bootstrap under the main assumption of an invertible linear process with
i.i.d. innovations for the underlying process $(X_t)$; this means a process which
admits an autoregressive representation (1.2). The class of statistics given
in (3.1) is quite rich and contains, for example, versions of sample
autocovariances, autocorrelations, partial autocorrelations, Yule-Walker
estimators and the standard sample mean as well.
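
To make the class (3.1) concrete, the following Python sketch evaluates a statistic of this form for user-supplied functions $g$ and $f$. It is a minimal illustration; the function name and the two example choices of $(m, g, f)$ are assumptions made here, the second one giving one possible "version" of the lag-one sample autocovariance.

```python
def generalized_mean_statistic(x, m, g, f):
    """T_n = f( (n-m+1)^{-1} * sum_t g(X_t, ..., X_{t+m-1}) ), g: R^m -> R^d, f: R^d -> R."""
    n = len(x)
    values = [g(x[t:t + m]) for t in range(n - m + 1)]  # g on each window of length m
    d = len(values[0])
    avg = [sum(v[i] for v in values) / len(values) for i in range(d)]
    return f(avg)


x = [1.0, 2.0, 3.0, 4.0]

# Sample mean: m = 1, g(x1) = (x1,), f(y) = y1
print(generalized_mean_statistic(x, 1, lambda w: (w[0],), lambda y: y[0]))  # 2.5

# A lag-one autocovariance version: m = 2, g(x1, x2) = (x1, x2, x1*x2), f(y) = y3 - y1*y2
lag1 = generalized_mean_statistic(
    x, 2, lambda w: (w[0], w[1], w[0] * w[1]), lambda y: y[2] - y[0] * y[1])
print(lag1)
```

Because every such statistic is a smooth function of window averages, the same routine can be applied unchanged to a bootstrap pseudo-series.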

The necessary smoothness assumptions on the functions $f$ and $g$ are
described below; these are identical to the ones imposed by Bühlmann (1997).

(A2) $f(y)$ has continuous partial derivatives for all $y$ in a neighborhood
of $\theta = E g(X_t, \ldots, X_{t+m-1})$ and the differentials $\sum_{i=1}^{d} \partial f(y)/\partial x_i |_{x=\theta}$ do not
vanish. The function $g$ has continuous partial derivatives of order $h$ ($h \ge 1$)
that satisfy a Lipschitz condition.

We intend to investigate in this section what the autoregressive sieve
bootstrap really mimics if it is applied to statistics of the form (3.1) and the
observations $X_1, \ldots, X_n$ do not stem from a linear AR($\infty$) process (1.2).
To be precise, we only assume that we observe $X_1, \ldots, X_n$ from a process
satisfying the following assumption (A3).

(A3) $(X_t : t \in \mathbb{Z})$ is a zero mean, strictly stationary and purely
nondeterministic stochastic process. The autocovariance function satisfies
$\sum_{h=0}^{\infty} h^r |\gamma_X(h)| < \infty$ for some $r \in \mathbb{N}$ specified in the respective results and
the spectral density $f_X$ is bounded and strictly positive. Furthermore,
$E(X_t^4) < \infty$.

Notice that the process class described by (A3) is large and
includes several of the commonly used linear and nonlinear time series models,
such as stationary and invertible autoregressive moving-average (ARMA)
processes, ARCH processes, GARCH processes and so on. Summability of
the autocovariance function implies that the spectral density $f_X$ exists, is
bounded and continuous. We also added in (A3) the assumption of finite
fourth order moments of the time series. This assumption seems to be
unavoidable due to the autoregressive parameter estimation involved in Step 1
of the AR-sieve bootstrap procedure and in regard of assumption (A1).

From Section 2, we know that if $(X_t)$ satisfies assumption (A3) then it
possesses an autoregressive representation with an uncorrelated white noise
process $(\varepsilon_t)$: cf. (2.3). Because of the strict stationarity of $(X_t)$, we have that
the time series $(\varepsilon_t)$ is strictly stationary as well and thus that the marginal
distribution $\mathcal{L}(\varepsilon_t)$ of $\varepsilon_t$ does not depend on $t$.

Theorem 3.1 is the main result of this section. To state it, we define the
companion autoregressive process $\tilde X = (\tilde X_t : t \in \mathbb{Z})$ where $\tilde X_t$ is generated as
follows:

(3.2)  $\tilde X_t = \sum_{j=1}^{\infty} a_j \tilde X_{t-j} + e_t, \qquad t \in \mathbb{Z};$

here $(e_t)$ consists of i.i.d. random variables whose marginal distribution
is identical to that of $\varepsilon_t$ from (2.3), that is, $\mathcal{L}(e_t) = \mathcal{L}(\varepsilon_t)$. It is worth
mentioning that all second order properties of $(\tilde X_t)$ and $(X_t)$, like autocovariance
function and spectral density, coincide while the probabilistic characteristics
beyond second order quantities of both stationary processes are not
necessarily the same. Now, let $\tilde T_n$ be the same statistic as $T_n$ defined in (3.1) but
with $X_t$ replaced by $\tilde X_t$, that is,

(3.3)  $\tilde T_n = f\left( \frac{1}{n-m+1} \sum_{t=1}^{n-m+1} g(\tilde X_t, \ldots, \tilde X_{t+m-1}) \right).$

The main message of Theorem 3.1 is that the AR-sieve bootstrap applied
to data $X_1, \ldots, X_n$ in order to approximate the distribution of the
statistic (3.1) will generally lead to an asymptotically consistent estimation
of the distribution of the statistic $\tilde T_n$. This implies that for the class of
statistics (3.1), the AR-sieve bootstrap will work if and only if the limiting
distributions of $T_n$ and of $\tilde T_n$ are identical.
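
A path of the companion process can be simulated once autoregressive coefficients and a sample from the marginal law of the innovations are available. The Python sketch below is a hedged illustration: it truncates the infinite autoregression (3.2) at the length of the coefficient list, approximates the marginal innovation law by resampling a given list of values, and discards a burn-in period. These three approximations, as well as the function name, are assumptions made here.

```python
import random


def simulate_companion(a, innovations, n, burn_in=500, seed=0):
    """Length-n path of X~_t = sum_j a_j X~_{t-j} + e_t with e_t drawn i.i.d. from `innovations`."""
    rng = random.Random(seed)
    p = len(a)
    path = [0.0] * p  # start-up values, discarded through the burn-in
    for _ in range(burn_in + n):
        e = rng.choice(innovations)  # independent draw: dependence among innovations is removed
        path.append(sum(a[j] * path[-1 - j] for j in range(p)) + e)
    return path[-n:]
```

The independent resampling step is the essential point: it keeps the marginal distribution of the innovations but discards any dependence between them, which is exactly how the companion process differs from the observed one.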

Theorem 3.1. Assume assumptions (A1), (A2), (A3) for $r = 1$ and
the moment condition $E\varepsilon_t^{2(h+2)} < \infty$ [cf. (A2) for the definition of $h$ and (2.3)
for the definition of $\varepsilon_t$], the condition $p(n) = o((n/\log n)^{1/4})$ on the order of
the approximating autoregression to the data and the following two further
assumptions:

(A4) The empirical distribution function $F_n$ of the random variables $\varepsilon_1, \ldots,
\varepsilon_n$ converges weakly to the distribution function $F$ of $\mathcal{L}(\varepsilon_1)$.

(A5) The empirical moments $\frac{1}{n} \sum_{t=1}^{n} \varepsilon_t^r$ converge in probability to $E\varepsilon_1^r$
for all $r \le 2(h+2)$.

Then,

(3.4)  $d_K\big( \mathcal{L}^*(\sqrt{n}(T_n^* - f(\theta^*))),\ \mathcal{L}(\sqrt{n}(\tilde T_n - f(\tilde\theta))) \big) = o_P(1)$

as $n \to \infty$. Here $\theta^* = E^* g(X_t^*, \ldots, X_{t+m-1}^*)$, $\tilde\theta = E g(\tilde X_t, \ldots, \tilde X_{t+m-1})$ and $d_K$
denotes the Kolmogorov distance.
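
In simulations the Kolmogorov distance between two laws is typically approximated from Monte Carlo samples. The following Python sketch computes the supremum distance between two empirical distribution functions; it is an illustrative helper with a hypothetical name, and it approximates the distance between the underlying distributions only up to Monte Carlo error.

```python
def kolmogorov_distance(sample_a, sample_b):
    """sup_x |F_a(x) - F_b(x)| for the empirical distribution functions of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    dist = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # advance both pointers past all observations equal to x
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        dist = max(dist, abs(i / len(a) - j / len(b)))
    return dist


print(kolmogorov_distance([1, 2, 3], [1, 2, 3]))  # 0.0
print(kolmogorov_distance([0, 1], [2, 3]))        # 1.0
```

Applied to bootstrap replicates on the one hand and simulated values of the statistic from the companion process on the other, this gives a finite-sample proxy for the quantity on the left of (3.4).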

Some remarks are in order. 

Remark 3.1. (i) Assumption (A4) does imply that we need some conditions
on the dependence structure of the random variables $\varepsilon_t$. For instance,
a standard mixing condition on $(\varepsilon_t)$ suffices to ensure (A4): cf. Politis,
Romano and Wolf (1999), Theorem 2.1.

(ii) Assumption (A5) on the empirical moments is fulfilled if we ensure
that sufficiently high empirical moments of the underlying strictly stationary
time series $X_t$ itself converge in probability to their theoretical
counterparts.

(iii) As we already pointed out, Theorem 3.1 states that the AR-sieve
bootstrap mimics the behavior of the companion autoregressive process $(\tilde X_t)$
as far as statistics of the form (3.1) are considered. Of course if $(X_t)$ is a linear
process with i.i.d. innovations and if the corresponding moving average
representation is invertible leading to an infinite order autoregression (1.2) with
i.i.d. innovations, then the AR-sieve bootstrap works asymptotically as is
already known. Moreover, for a general process satisfying assumption (A3), if
we are in the advantageous situation that the existing dependence structure
of the innovations $(\varepsilon_t)$ appearing in the general AR($\infty$) representation (2.3)
does not show up in the limiting distribution of $T_n$, then Theorem 3.1
implies that the AR-sieve bootstrap works. We will illustrate this point by
several examples later on.

Now we discuss relevant specializations of Theorem 3.1. Notice that the
advantage of this theorem is that in order to check validity of the AR-sieve
bootstrap, one only needs to check whether the asymptotic distribution of
$T_n = T_n(X_1, \ldots, X_n)$ based on the observed time series is identical to the
asymptotic distribution of the statistic $\tilde T_n = T_n(\tilde X_1, \ldots, \tilde X_n)$ based on fictitious
observations $\tilde X_1, \tilde X_2, \ldots, \tilde X_n$ from the companion process $\tilde X$. If and only if this is
the case, then the AR-sieve bootstrap works asymptotically.

Example 3.1 (Sample mean). Consider the case of the sample mean
$T_n = \bar X_n = n^{-1}\sum_{t=1}^{n} X_t$ and recall that under standard
and mild regularity conditions (e.g., mixing or weak dependence) we typically
obtain that the sample mean of a stationary time series satisfies
$\sqrt{n}\,T_n \Rightarrow \mathcal{N}(0, \sum_{h=-\infty}^{\infty}\gamma_X(h))$
as $n \to \infty$, where $\Rightarrow$ stands for weak convergence. Thus, the
asymptotic distribution of the sample mean depends only on the second order
properties of the underlying process $X$, and since the companion process
$\tilde X$ has the same second order properties as $X$, we immediately get by
Theorem 3.1 that the AR-sieve bootstrap asymptotically works in the case of
the mean for general stationary time series for which the spectral density is
strictly positive. Even strict stationarity is not necessary in this case.
This is a novel and significant extension of the results of Bühlmann (1997).
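The procedure behind Example 3.1 can be sketched numerically. The following is only a minimal illustration, assuming NumPy, a fixed autoregressive order `p` chosen by the user, and i.i.d. resampling of centered residuals; the function names `yule_walker` and `ar_sieve_bootstrap_mean`, the burn-in length and the number of replicates are hypothetical choices made for this sketch, not part of the paper's formal definition of the procedure.

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker estimates of AR(p) coefficients from centered data x."""
    n = len(x)
    # Biased sample autocovariances gamma(0), ..., gamma(p).
    gamma = np.array([x[: n - k] @ x[k:] / n for k in range(p + 1)])
    # Toeplitz matrix with entries gamma(|i - j|), i, j = 0, ..., p-1.
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.linalg.solve(gamma[idx], gamma[1:])

def ar_sieve_bootstrap_mean(x, p, n_boot=500, burn_in=200, seed=0):
    """Bootstrap replicates of sqrt(n) * (mean of X*), X* from the fitted AR(p)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    xc = x - x.mean()
    a = yule_walker(xc, p)
    # Residuals eps_t = X_t - sum_j a_j X_{t-j}, t = p+1, ..., n, then centered.
    lagged = np.column_stack([xc[p - j : n - j] for j in range(1, p + 1)])
    resid = xc[p:] - lagged @ a
    resid = resid - resid.mean()
    stats = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.choice(resid, size=n + burn_in, replace=True)
        xs = np.zeros(n + burn_in)
        for t in range(n + burn_in):
            past = xs[max(t - p, 0) : t][::-1]  # X*_{t-1}, ..., X*_{t-p}
            xs[t] = a[: len(past)] @ past + eps[t]
        # Discard the burn-in so the pseudo-series is close to stationary.
        stats[b] = np.sqrt(n) * xs[burn_in:].mean()
    return stats
```

The empirical distribution of the returned replicates is then used as an approximation to the distribution of $\sqrt{n}(\bar X_n - EX_t)$; in practice the order `p` would grow with the sample size.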

Example 3.2 (Sample autocovariances). For $0 \le h < n$, let
$T_n = \hat\gamma(h) = n^{-1}\sum_{t=1}^{n-h}(X_t - \bar X_n)(X_{t+h} - \bar X_n)$
be the sample autocovariance at lag $h$. Let us assume that
$\sum_{h_1,h_2,h_3=-\infty}^{\infty} |\mathrm{cum}(X_t, X_{t+h_1}, X_{t+h_2}, X_{t+h_3})| < \infty$
holds and denote by

$$ f_4(\lambda_1,\lambda_2,\lambda_3) = (2\pi)^{-3}\sum_{h_1,h_2,h_3=-\infty}^{\infty} \mathrm{cum}(X_t, X_{t+h_1}, X_{t+h_2}, X_{t+h_3})\, e^{-i(\lambda_1 h_1+\lambda_2 h_2+\lambda_3 h_3)} $$

the fourth order cumulant spectrum of $X$. Under standard and mild
regularity conditions [see, for instance, Dahlhaus (1985), Theorem 2.1 and
Taniguchi and Kakizawa (2000), Chapter 6.1], it is known that
$\sqrt{n}(T_n - \gamma(h)) \Rightarrow \mathcal{N}(0,\tau^2)$ where

$$ \tau^2 = \int_{-\pi}^{\pi} 4\cos(\lambda h)^2 f^2(\lambda)\,d\lambda + \iint 4\cos(\lambda_1 h)\cos(\lambda_2 h)\, f_4(-\lambda_1,\lambda_2,-\lambda_2)\,d\lambda_1\,d\lambda_2. $$

Notice that in contrast to the case of the sample mean, the limiting
distribution of the sample autocovariance depends also on the fourth order
moment structure of the underlying process $X$. Now, to check validity of the
AR-sieve bootstrap we have to derive the asymptotic distribution of the
sample autocovariances for the companion autoregressive process
$(\tilde X_t)$. This can be easily done, since the autoregressive polynomial
in the general autoregressive representation (2.3) is always invertible
(cf. Corollary 2.1), from which we immediately get a one-sided moving average
representation with i.i.d. innovations $(\tilde\varepsilon_t)$ for the
companion process $(\tilde X_t)$. Furthermore, the fourth order cumulant
spectrum of $(\tilde X_t)$ is given by

$$ \tilde f_4(\lambda_1,\lambda_2,\lambda_3) = (2\pi)^{-3}\Bigl(\frac{E\tilde\varepsilon_1^4}{(E\tilde\varepsilon_1^2)^2} - 3\Bigr)\,\alpha(\lambda_1)\alpha(\lambda_2)\alpha(\lambda_3)\alpha(-\lambda_1-\lambda_2-\lambda_3), $$

where $\alpha(\lambda) = \sum_{j=0}^{\infty}\alpha_j \exp\{-ij\lambda\}$ and
the coefficients $(\alpha_k)$ are those appearing in (2.18); see Section 2.
Thus, for the sample autocovariance $\tilde T_n$ we get from Brockwell and
Davis [(1991), Proposition 7.3.1] that
$\sqrt{n}(\tilde T_n - \gamma(h)) \Rightarrow \mathcal{N}(0,\tilde\tau^2)$ where

$$ \begin{aligned}
\tilde\tau^2 &= \int_{-\pi}^{\pi} 4\cos(\lambda h)^2 f^2(\lambda)\,d\lambda + \iint 4\cos(\lambda_1 h)\cos(\lambda_2 h)\, \tilde f_4(-\lambda_1,\lambda_2,-\lambda_2)\,d\lambda_1\,d\lambda_2 \\
&= \Bigl(\frac{E\tilde\varepsilon_1^4}{(E\tilde\varepsilon_1^2)^2} - 3\Bigr)\gamma(h)^2 + \sum_{k=-\infty}^{\infty}\bigl(\gamma(k)^2 + \gamma(k+h)\gamma(k-h)\bigr).
\end{aligned} \tag{3.5} $$

Since the variances $\tau^2$ and $\tilde\tau^2$ of the asymptotic
distributions of $T_n$ and $\tilde T_n$ do not coincide in general, we
conclude by Theorem 3.1 that the AR-sieve bootstrap fails for sample
autocovariances. Notice that this failure is due to the fact that in general
the limiting distribution of sample autocovariances depends additionally on
the fourth order moment structure of the underlying process $X$, and this
structure may substantially differ from that of the companion process
$\tilde X$.

Interestingly enough, even if the underlying process $X$ is a linear time
series, that is, satisfies (1.1), the AR-sieve bootstrap may fail for the
sample autocovariances. To see why, note that from the aforementioned
proposition of Brockwell and Davis (1991), the asymptotic distribution of
$T_n$ satisfies $\sqrt{n}(T_n - \gamma(h)) \Rightarrow \mathcal{N}(0,\tau_L^2)$
where $\tau_L^2$ is given by

$$ \tau_L^2 = \Bigl(\frac{Ee_1^4}{(Ee_1^2)^2} - 3\Bigr)\gamma^2(h) + \sum_{k=-\infty}^{\infty}\bigl(\gamma^2(k) + \gamma(k+h)\gamma(k-h)\bigr). \tag{3.6} $$

Special attention is now due to the factor of the first summand of (3.6),
which is the fourth order cumulant of the i.i.d. process $(e_t)$. Recall the
asymptotic distribution of the sample autocovariances for the companion
autoregressive process $(\tilde X_t)$ and especially its variance given in
(3.5). The two asymptotic variances given in (3.6) and (3.5) are in general
not the same, since the fourth order cumulants of the two innovation
processes $(e_t)$ and $(\tilde\varepsilon_t)$ are not necessarily the same.
We refer to Remark 2.1 for an example. Of course, all appearing
autocovariances are identical, since we do not change the second order
properties of the process when switching from $(X_t)$ to $(\tilde X_t)$. The
consequence is that for sample autocovariances, the AR-sieve bootstrap
generally does not work even for linear processes of the type (1.1), and this
is true even if the process is causal; see Remark 2.1 and example (2.6).
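To make the role of the fourth order cumulant concrete, the following sketch evaluates a variance of the common form of (3.5) and (3.6) for a given autocovariance function and a given normalized fourth moment. It is an illustration under stated assumptions: the helper names `autocov_limit_variance` and `ma1_autocov`, the truncation point `K`, and the MA(1) example with coefficient 0.5 are hypothetical and chosen only for this sketch.

```python
def autocov_limit_variance(gamma, h, kurtosis_ratio, K=500):
    """(kurtosis_ratio - 3) * gamma(h)^2 + sum_{|k| <= K} (gamma(k)^2 + gamma(k+h) * gamma(k-h)).

    gamma          : callable returning the autocovariance at any integer lag
    kurtosis_ratio : E e^4 / (E e^2)^2 of the i.i.d. innovations
    K              : truncation point of the infinite sum (assumed large enough)
    """
    second_order = sum(
        gamma(k) ** 2 + gamma(k + h) * gamma(k - h) for k in range(-K, K + 1)
    )
    return (kurtosis_ratio - 3.0) * gamma(h) ** 2 + second_order

def ma1_autocov(theta, sigma2=1.0):
    """Autocovariance function of the MA(1) process X_t = e_t + theta * e_{t-1}."""
    def gamma(k):
        k = abs(k)
        if k == 0:
            return sigma2 * (1.0 + theta ** 2)
        if k == 1:
            return sigma2 * theta
        return 0.0
    return gamma

gamma = ma1_autocov(0.5)
# Same second order structure, different innovation kurtosis:
var_gaussian = autocov_limit_variance(gamma, h=1, kurtosis_ratio=3.0)  # Gaussian innovations
var_laplace = autocov_limit_variance(gamma, h=1, kurtosis_ratio=6.0)   # Laplace innovations
```

The two values differ by exactly $(6-3)\gamma(1)^2$ although the autocovariances are identical, which is the mechanism behind the failure described above.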

Remark 3.2. If the innovations $\{e_t\}$ in (1.1) are not necessarily i.i.d.
but form a martingale difference sequence, then we are in the situation
described above, which is more general than the case of (1.1) with i.i.d.
innovations. Thus the limiting distribution of sample autocovariances, and
also of statistics of the type (3.1), is not correctly mimicked by the
AR-sieve bootstrap. This contradicts Theorem 2 of Poskitt (2008).

We conclude this example by mentioning that in the case where the i.i.d.
innovations $e_t$ in (1.1) are normally distributed, it follows that the
random variables $\tilde\varepsilon_t$ are Gaussian as well, and that in both
expressions (3.6) and (3.5) the fourth order cumulants appearing as factors
in the first summands vanish. This means that for the special case of
Gaussian time series fulfilling assumption (1.1) the AR-sieve bootstrap
works.

Example 3.3 (Sample autocorrelations). Consider the estimator
$T_n = \hat\rho(h) = \hat\gamma(h)/\hat\gamma(0)$ of the autocorrelation
$\rho_X(h) = \gamma_X(h)/\gamma_X(0)$. For general processes satisfying
assumption (A3), the limiting distribution of $T_n$ depends also on the
fourth order moment structure of the underlying process $X$ [cf. Theorem 3.1
of Romano and Thombs (1996)], which is not mimicked correctly by the
companion process $\tilde X$; hence the AR-sieve bootstrap fails. Fortunately,
the situation for autocorrelations is much better if we switch to linear
processes of type (1.1). From Theorem 7.2.1 of Brockwell and Davis (1991), we
obtain that $\sqrt{n}(T_n - \rho_X(h)) \Rightarrow \mathcal{N}(0,v^2)$, where
the asymptotic variance is given by Bartlett's formula:

$$ v^2 = \sum_{k\in\mathbb{Z}} \bigl\{(1 + 2\rho_X^2(h))\rho_X^2(k) + \rho_X(k-h)\rho_X(k+h) - 4\rho_X(h)\rho_X(k)\rho_X(k+h)\bigr\}. $$


As can be seen, in this case the asymptotic variance depends only on the
autocorrelation function $\rho_X(\cdot)$ [or equivalently on the standardized
spectral density $\gamma_X^{-1}(0) f_X(\cdot)$] of the underlying process $X$.
This means that the first summand in (3.6), which refers to the fourth order
cumulant of the i.i.d. innovation process $e_t$ in (1.1), does not show up in
the limiting distribution of sample autocorrelations, and this in turn leads
to the fact that the asymptotic distribution of sample autocorrelations based
on observations stemming from the process $X$ is identical to that of the
sample autocorrelation based on observations stemming from the companion
autoregressive process $\tilde X$. This is true since the companion
autoregressive process $\tilde X$ shares all second order properties of the
process $X$. Hence, the AR-sieve bootstrap works for the autocorrelations
given data from a linear process (1.1). We stress here the fact that this
result is true regardless of whether the representation (1.1) allows for an
autoregressive inversion or not, and holds even though the probabilistic
properties of the underlying process $(X_t)$ and of the autoregressive
companion process $(\tilde X_t)$ beyond second order properties are not the
same.
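Bartlett's formula is easy to evaluate numerically, which makes it visible that only the autocorrelation function enters. The sketch below assumes the autocorrelation function is supplied as a callable on all integers and truncates the infinite sum at a user-chosen `K`; the function names are hypothetical and introduced only for illustration.

```python
def bartlett_variance(rho, h, K=200):
    """Asymptotic variance of sqrt(n) * (rho_hat(h) - rho(h)) by Bartlett's formula.

    rho : callable giving the autocorrelation at any integer lag (rho(0) = 1)
    K   : truncation point of the sum over k (assumed large enough)
    """
    total = 0.0
    for k in range(-K, K + 1):
        total += (
            (1.0 + 2.0 * rho(h) ** 2) * rho(k) ** 2
            + rho(k - h) * rho(k + h)
            - 4.0 * rho(h) * rho(k) * rho(k + h)
        )
    return total

def ma1_rho(r):
    """Autocorrelation function of an MA(1) process with lag-one autocorrelation r."""
    return lambda k: 1.0 if k == 0 else (r if abs(k) == 1 else 0.0)

def ar1_rho(phi):
    """Autocorrelation function of a causal AR(1) process with coefficient phi."""
    return lambda k: phi ** abs(k)

v2_ma1 = bartlett_variance(ma1_rho(0.4), h=1)  # closed form: 1 - 3 r^2 + 4 r^4
v2_ar1 = bartlett_variance(ar1_rho(0.5), h=1)  # closed form: 1 - phi^2
```

No innovation moment beyond the second appears as an argument, in line with the discussion above.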

Remark 3.3. Similar to the autocorrelation case, the validity of the AR- 
sieve bootstrap in the linear class (1.1) is shared by many different statistics 
whose large-sample distribution depends only on the (first and) second order 
moment structure of the underlying process. Examples include the partial
autocorrelations or Yule-Walker estimators of autoregressive coefficients.

Remark 3.4. Consider statistics of the type (3.1) in the easiest case
where $d = 1$, that is,

$$ \frac{1}{n-m+1}\sum_{t=1}^{n-m+1} g(X_t,\ldots,X_{t+m-1}) \tag{3.7} $$

for a function $g:\mathbb{R}^m \to \mathbb{R}$. Here, the practitioner may
approach this as the sample mean of observations $Y_1,\ldots,Y_{n-m+1}$, where
$(Y_t = g(X_t,\ldots,X_{t+m-1}) : t\in\mathbb{Z})$. Notice that strict
stationarity as well as mixing properties easily carry over from $(X_t)$ to
$(Y_t)$. Thus, we may apply the AR-sieve bootstrap to the sample mean of
$Y_t$, which works under quite general assumptions; cf. Remark 3.1 and
Theorem 3.1. The only crucial assumption for establishing asymptotic
consistency of the AR-sieve bootstrap is that the spectral density $f_Y$ of
the transformed time series $(Y_t)$ is strictly positive and continuous.
Although this is not a very restrictive condition, it may be difficult to
check, since there seems to be no general result describing the behavior of
spectral densities under nonlinear transformations.



3.2. Integrated periodograms. The considerations of Examples 3.2 and 3.3
can be transferred to integrated periodogram estimators for these quantities,
which leads us to the second large class of statistics that we will discuss.
Denote, based on observations $X_1,\ldots,X_n$, the periodogram $I_n(\lambda)$
defined by

$$ I_n(\lambda) = \frac{1}{2\pi n}\Bigl|\sum_{t=1}^{n} X_t e^{-i\lambda t}\Bigr|^2, \qquad \lambda\in[0,\pi], \tag{3.8} $$

and consider a general class of integrated periodogram estimators defined by

$$ M(I_n,\varphi) = \int_0^{\pi} \varphi(\lambda) I_n(\lambda)\,d\lambda, \tag{3.9} $$

where $\varphi$ denotes an appropriately defined function on $[0,\pi]$. Under
the main assumption that the underlying process $X$ has the representation
(1.1), Dahlhaus (1985) investigated asymptotic properties of (3.9) and
obtained the asymptotic distribution of
$\sqrt{n}(M(I_n,\varphi) - M(f_X,\varphi))$. In particular, it has been shown
that $\sqrt{n}(M(I_n,\varphi) - M(f_X,\varphi))$ converges, as $n\to\infty$,
to a Gaussian distribution with zero mean and variance given by

$$ \Bigl(\frac{Ee_1^4}{(Ee_1^2)^2} - 3\Bigr)\Bigl(\int_0^{\pi}\varphi(\lambda) f_X(\lambda)\,d\lambda\Bigr)^2 + 2\pi\int_0^{\pi}\varphi^2(\lambda) f_X^2(\lambda)\,d\lambda. \tag{3.10} $$

Notice that substituting $\varphi(\lambda)$ by $2\cos(\lambda h)$,
$h\in\mathbb{N}_0$, implies that $M(I_n,\varphi)$ equals the sample
autocovariance of the observations $X_1,\ldots,X_n$ at lag $h$, and (3.10)
then turns out to be exactly the asymptotic variance given in (3.6).
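The identity between the integrated periodogram with $\varphi(\lambda) = 2\cos(\lambda h)$ and the sample autocovariance can be checked numerically. The sketch below assumes NumPy, data that have already been centered (so that the autocovariance is the uncentered average $n^{-1}\sum_t X_t X_{t+h}$), and the periodogram normalization $(2\pi n)^{-1}$; the function names and the grid size $N = 2n$ are choices made for this illustration. Because $\cos(\lambda h) I_n(\lambda)$ is a trigonometric polynomial of degree below $2n$, the Riemann sum on $N = 2n$ equispaced frequencies reproduces the integral up to rounding.

```python
import numpy as np

def periodogram_on_grid(x, N):
    """Periodogram (2*pi*n)^{-1} |sum_t x_t exp(-i*lam*t)|^2 at lam_j = 2*pi*j/N."""
    n = len(x)
    dft = np.fft.fft(x, n=N)  # zero-padded DFT evaluated on the finer grid
    return np.abs(dft) ** 2 / (2.0 * np.pi * n)

def integrated_periodogram_cos(x, h):
    """M(I_n, phi) for phi(lam) = 2*cos(lam*h), with 0 <= h < n.

    Since I_n is even and 2*pi-periodic, the integral over [0, pi] of
    2*cos(lam*h)*I_n(lam) equals the integral over [0, 2*pi) of cos(lam*h)*I_n(lam).
    """
    n = len(x)
    N = 2 * n
    lam = 2.0 * np.pi * np.arange(N) / N
    I = periodogram_on_grid(x, N)
    return (2.0 * np.pi / N) * np.sum(np.cos(lam * h) * I)

def sample_autocov(x, h):
    """Uncentered sample autocovariance n^{-1} sum_{t=1}^{n-h} x_t x_{t+h}."""
    n = len(x)
    return x[: n - h] @ x[h:] / n
```

For centered data `x`, `integrated_periodogram_cos(x, h)` and `sample_autocov(x, h)` agree up to floating point error.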

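As we will see in Theorem 3.2, the situation for integrated periodograms
(3.9) is rather similar to that of empirical autocovariances, which are of
course special cases of integrated periodograms. Thus, we only discuss
briefly this rather relevant class of statistics. As Theorem 3.2 shows, we
obtain for this class that the AR-sieve bootstrap asymptotically mimics the
behavior of $\sqrt{n}(M(\tilde I_n,\varphi) - M(f_X,\varphi))$, where
$\tilde I_n$ is defined as $I_n$ with $X_t$ replaced by the companion
autoregressive time series $\tilde X_t$, that is,

$$ \tilde I_n(\lambda) = \frac{1}{2\pi n}\Bigl|\sum_{t=1}^{n} \tilde X_t e^{-i\lambda t}\Bigr|^2, \qquad \lambda\in[0,\pi]. \tag{3.11} $$

Theorem 3.2. Assume (A1) and (A3) with $r = 1$ and assume that for all
$M\in\mathbb{N}$

$$ \mathcal{L}^*\bigl(\sqrt{n}(\hat\gamma_{X^*}(h) - E^*\hat\gamma_{X^*}(h)) : h = 0,1,\ldots,M\bigr) \Rightarrow \mathcal{N}(0,V_M) \tag{3.12} $$

(in probability), where $\hat\gamma_{X^*}(h)$ denote the empirical
autocovariances of the bootstrap observations $X_1^*, X_2^*, \ldots, X_n^*$
and where

$$ V_M = \Bigl[\Bigl(\frac{E\varepsilon_1^4}{(E\varepsilon_1^2)^2} - 3\Bigr)\gamma(i)\gamma(j) + \sum_{k=-\infty}^{\infty}\bigl(\gamma(k)\gamma(k-i+j) + \gamma(k+j)\gamma(k-i)\bigr)\Bigr]_{i,j=0}^{M}. \tag{3.13} $$

Then we obtain for all $\varphi$ bounded and with bounded variation that (in
probability)

$$ d_K\bigl(\mathcal{L}^*(\sqrt{n}(M(I_n^*,\varphi) - M(\hat f_{AR},\varphi))),\; \mathcal{L}(\sqrt{n}(M(\tilde I_n,\varphi) - M(f_X,\varphi)))\bigr) \to 0. \tag{3.14} $$

Here $\hat f_{AR}(\lambda) = \frac{\hat\sigma(p(n))^2}{2\pi}\bigl|1 - \sum_{j=1}^{p(n)}\hat a_j(p(n)) e^{-ij\lambda}\bigr|^{-2}$,
$\lambda\in[0,\pi]$, and $\hat\sigma(p(n))^2 = E^*\varepsilon_1^*(p(n))^2$,
cf. Step 2 in the definition of the AR-sieve bootstrap procedure.

Moreover, the limiting Gaussian distribution of
$\sqrt{n}(M(I_n^*,\varphi) - M(\hat f_{AR},\varphi))$ possesses the following
variance:

$$ \Bigl(\frac{E\varepsilon_1^4}{(E\varepsilon_1^2)^2} - 3\Bigr)\Bigl(\int_0^{\pi}\varphi(\lambda) f_X(\lambda)\,d\lambda\Bigr)^2 + 2\pi\int_0^{\pi}\varphi^2(\lambda) f_X^2(\lambda)\,d\lambda. \tag{3.15} $$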



Remark 3.5. (i) Assumptions under which (3.12) is fulfilled are given
in Theorem 3.1 since sample autocovariances belong to the class (3.1).

(ii) Theorem 3.2 implies that if $\int_0^{\pi}\varphi(\lambda) f(\lambda)\,d\lambda = 0$,
then the AR-sieve bootstrap asymptotically works for the integrated
periodogram statistics $M(I_n,\varphi)$ for time series fulfilling (1.1).
This follows immediately by a comparison of the asymptotic variances (3.10)
and (3.15). Clearly, the same result holds true if the underlying time series
is normally distributed, since in this case both innovation processes,
$(e_t)$ and $(\tilde\varepsilon_t)$, are Gaussian and therefore the fourth
order cumulants vanish. In all other cases the AR-sieve bootstrap does not
work in general, since the fourth order cumulant $Ee_1^4/(Ee_1^2)^2 - 3$ does
not necessarily coincide with
$E\varepsilon_1^4/(E\varepsilon_1^2)^2 - 3$; see Example 3.2.

Relevant statistics for which we can take advantage of the condition
$\int_0^{\pi}\varphi(\lambda) f(\lambda)\,d\lambda = 0$ are the so-called
ratio statistics, which are defined by

$$ R(I_n,\varphi) = \frac{\int_0^{\pi}\varphi(\lambda) I_n(\lambda)\,d\lambda}{\int_0^{\pi} I_n(\lambda)\,d\lambda}. \tag{3.16} $$

For this class of statistics, Dahlhaus and Janas (1996) showed that under
the same assumptions of a linear process of type (1.1) one obtains that
$\sqrt{n}(R(I_n,\varphi) - R(f_X,\varphi))$ has a Gaussian limiting
distribution with mean zero and variance given by

$$ \frac{2\pi\int_0^{\pi}\psi^2(\lambda) f_X^2(\lambda)\,d\lambda}{\bigl(\int_0^{\pi} f_X(\lambda)\,d\lambda\bigr)^4}, \tag{3.17} $$

where $\psi(\lambda) = \varphi(\lambda)\int_0^{\pi} f_X(\lambda)\,d\lambda - \int_0^{\pi}\varphi(\lambda) f_X(\lambda)\,d\lambda$.

Thus, exactly as in the case of sample autocorrelations the fourth order 
cumulant term [cf. (3.10)] of the i.i.d. innovation process disappears and 
therefore again the following corollary to Theorem 3.2 is true. This corollary 
states that the AR-sieve bootstrap works for ratio statistics under the quite 
general assumption that the underlying process is a linear time series (1.1) 
with i.i.d. innovations and a strictly positive spectral density which under 
model (1.1) is always continuous. 

Corollary 3.1. Under the assumptions of Theorem 3.2, we have that
(in probability)

$$ d_K\bigl(\mathcal{L}^*(\sqrt{n}(R(I_n^*,\varphi) - R(\hat f_{AR},\varphi))),\; \mathcal{L}(\sqrt{n}(R(I_n,\varphi) - R(f,\varphi)))\bigr) \to 0. \tag{3.18} $$

Moreover, the limiting Gaussian distribution of
$\sqrt{n}(R(I_n^*,\varphi) - R(\hat f_{AR},\varphi))$ possesses the variance
given in (3.17).

Another class of integrated periodogram estimators is that of nonparametric
estimators of the spectral density $f_X$, which are obtained from (3.9) if we
allow for the function $\varphi$ to depend on $n$. In particular, let
$\varphi_n(\lambda) = K_h(\omega - \lambda)$ for some $\omega\in[0,\pi]$,
where $h = h(n)$ is a sequence of positive numbers (bandwidths) approaching
zero as $n\to\infty$, $K_h(\cdot) = h^{-1}K(\cdot/h)$ and $K$ is a kernel
function satisfying the following assumption:

(A6) $K$ is a nonnegative kernel function with compact support $[-\pi,\pi]$.
The Fourier transform $k$ of $K$ is assumed to be a symmetric, continuous and
bounded function satisfying $k(0) = 2\pi$ and $\int k^2(u)\,du < \infty$.

Denote by $\hat f_{n,X}$ the resulting integrated periodogram estimator,
that is,

$$ \hat f_{n,X}(\omega) = \int K_h(\omega - \lambda) I_n(\lambda)\,d\lambda. \tag{3.19} $$

Notice that the asymptotic properties of the estimator (3.19) of the spectral
density are identical to those of its discretized version
$\hat f_{n,X}(\omega) = (nh)^{-1}\sum_j K_h(\omega - \lambda_j) I_n(\lambda_j)$,
where $\lambda_j = 2\pi j/n$ are the Fourier frequencies, as well as of
so-called lag-window estimators; cf. Priestley (1981).

Now, let $\hat f_{n,X}^*$ be the same estimator as (3.19) based on the
AR-sieve bootstrap periodogram
$I_n^*(\lambda) = (2\pi n)^{-1}\bigl|\sum_{t=1}^{n} X_t^* \exp\{-i\lambda t\}\bigr|^2$.
We then obtain the following theorem.

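Theorem 3.3. Under the assumptions of Theorem 3.2 with $r = 2$ and
assumption (A6), we have that (in probability)

$$ d_K\bigl(\mathcal{L}^*(\sqrt{nh}(\hat f_{n,X}^*(\lambda) - \hat f_{AR}(\lambda))),\; \mathcal{L}(\sqrt{nh}(\hat f_{n,X}(\lambda) - f_X(\lambda)))\bigr) \to 0. \tag{3.20} $$

Moreover, conditionally on $X_1, X_2, \ldots, X_n$,

$$ \sqrt{nh}\,E^*\bigl(\hat f_{n,X}^*(\lambda) - \hat f_{AR}(\lambda)\bigr) \to
\begin{cases}
0, & \text{if } n^{1/5}h \to 0,\\
\frac{1}{4\pi} f_X''(\lambda)\int u^2 K(u)\,du, & \text{if } n^{1/5}h \to 1,
\end{cases} \tag{3.21} $$

where $f_X''$ denotes the second derivative of $f_X$, and

$$ nh\,\mathrm{Var}^*(\hat f_{n,X}^*(\lambda)) \to (1 + \delta_{0,\pi})\, f_X^2(\lambda)\,(2\pi)^{-1}\int K^2(u)\,du, \tag{3.22} $$

where $\delta_{0,\pi} = 1$ if $\lambda = 0$ or $\pi$ and $\delta_{0,\pi} = 0$
otherwise.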

Recall that under Assumption (A3) it has been shown under different
regularity conditions [see Shao and Wu (2007)] that
$\sqrt{nh}(\hat f_{n,X}(\lambda) - f_X(\lambda))$ converges to a Gaussian
distribution with mean and variance given by the expressions on the
right-hand sides of (3.21) and (3.22), respectively. Thus, the above theorem
implies that for spectral density estimators like (3.19), the AR-sieve
bootstrap is asymptotically valid for a very broad class of stationary time
series that goes far beyond the class of linear processes (1.1).

Corollary 3.1 and Theorem 3.3 highlight an interesting relation between
frequency domain bootstrap procedures, like for instance those proposed by
Franke and Härdle (1992) and Dahlhaus and Janas (1996), and the AR-sieve
bootstrap. Notice that the basic assumptions imposed on the underlying
process $X$ for such a frequency domain bootstrap procedure to be valid are
that the underlying process satisfies (1.1) with a strictly positive spectral
density $f_X$. Furthermore, validity of such a frequency domain procedure has
been established only for those statistics whose limiting distribution does
not depend on the fourth order moment structure of the innovation process
$e_t$ in (1.1). Thus, such a frequency domain bootstrap essentially works
for statistics like ratio statistics or nonparametric estimators of the spectral
density like (3.19). The results of this section, that is, Theorem 3.2,
Corollary 3.1 and Theorem 3.3, imply that if the underlying stationary
process satisfies (1.1) and if the spectral density is strictly positive,
then the AR-sieve bootstrap works in the same cases in which the frequency
domain bootstrap procedures work.

4. Conclusions. In this paper, we have investigated the range of validity 
of the AR-sieve bootstrap. Based on a quite general Wold-type autoregres- 
sive representation, we provided a simple and effective tool for verifying 
whether or not the AR-sieve bootstrap asymptotically works. The central 
question is to what extent the complex dependence structure of the under- 
lying stochastic process shows up in the (asymptotic) distribution of the 
relevant statistical quantities. If the asymptotic behavior of the statistic of 
interest based on our data series is identical to that of the same statistic 
based on data generated from the companion autoregressive process, then 
the AR-sieve bootstrap leads to asymptotically correct results. 

The family of estimators that have been considered ranges from simple 
arithmetic means and sample autocorrelations to quite general sample means 
of functions of the observations as well as spectral density estimators and 
integrated periodograms. Our concrete findings concerning validity of the 
AR-sieve bootstrap are different for different statistics. Generally speaking, 
if the asymptotic distribution of a relevant statistic is determined solely 
by the first and second order moment structure, then the AR-sieve boot- 
strap is expected to work. Thus, validity of the AR-sieve bootstrap does 
not require that the underlying stationary process obeys a linear AR(oo) 
representation (with i.i.d. or martingale difference errors) as was previously 
thought. Indeed, for many statistics of interest, the range of the validity of 
the AR-sieve bootstrap goes far beyond this subclass of linear processes. In 
contrast, we point out the possibility that the AR-sieve bootstrap may fail 
even though the data series is linear; a prominent example is the sample 
autocovariance in the case of the data arising from a noncausal AR($p$) or
a noninvertible MA($q$) model.

Finally, our results bear out an interesting analogy between frequency 
domain bootstrap methods and the AR-sieve method. In the past, both 
of these methodologies have been thought to work only in the linear time 
series setting. Nevertheless, we have just shown the validity of the AR-sieve 
bootstrap for many statistics of interest without the assumption of linearity, 
for example, under the general assumption (A3) and some extra conditions. 
In recent literature, some examples have been found where the frequency 
domain bootstrap also works without the assumption of a linear process; see, 
for example, the case of spectral density estimators studied by Shao and Wu 
(2007). By analogy to the AR-sieve results of the paper, it can be conjectured 
that frequency domain bootstrap methods might also be valid without the 
linearity assumption as long as the statistic in question has a large-sample 
distribution depending only on first and second order moment properties; 
cf. Kirch and Politis (2011) for some results in that direction.

APPENDIX: AUXILIARY RESULTS AND PROOFS



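Proof of Lemma 2.2. From Baxter [(1962), Theorem 2.2], in a slightly
more general version given in Baxter [(1963), Theorem 1.1], we obtain for
arbitrary submultiplicative weight or norm functions $\nu(k) \ge 1$, that is,
$\nu(n) \le \nu(m)\cdot\nu(n-m)$ for all $n, m$, that the following bound
holds true for all $p\in\mathbb{N}$ and a constant $C > 0$:

$$ \sum_{k=0}^{p}\nu(k)\left|\frac{a_k(p)}{\sigma^2(p)} - \frac{a_k}{\sigma^2}\right| \le C\cdot\sum_{k=p+1}^{\infty}\nu(k)\left|\frac{a_k}{\sigma^2}\right|. \tag{A.1} $$

Here, $\sigma^2(p) = E(X_t - \sum_{k=1}^{p} a_k(p)X_{t-k})^2 \le \sigma^2(0)$
for all $p\in\mathbb{N}$ and $\sigma^2(p)\to\sigma^2$ [cf. (2.4)] as
$p\to\infty$.

Since $\nu(k) = (1+k)^r$ is submultiplicative for all $r \ge 0$, cf.
Gröchenig [(2007), Lemma 2.1], we obtain from (A.1)

$$ \begin{aligned}
\sum_{k=0}^{p}(1+k)^r|a_k(p) - a_k|
&\le \sum_{k=0}^{p}(1+k)^r\left|\frac{a_k(p)}{\sigma^2(p)} - \frac{a_k}{\sigma^2}\right|\cdot\sigma^2(p)
 + \sum_{k=0}^{p}(1+k)^r|a_k|\left|\frac{1}{\sigma^2(p)} - \frac{1}{\sigma^2}\right|\sigma^2(p) \\
&\le \frac{C\sigma^2(0)}{\sigma^2}\Bigl(1 + \sum_{k=0}^{p}(1+k)^r|a_k|\Bigr)\sum_{k=p+1}^{\infty}(1+k)^r|a_k|,
\end{aligned} \tag{A.2} $$

which is the assertion of Lemma 2.2. To see the last bound, observe that
because of $a_0(p) = a_0 = 1$ we can bound
$\bigl|\frac{1}{\sigma^2(p)} - \frac{1}{\sigma^2}\bigr|$ by the right-hand
side of (A.1) as well. □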



Proof of Lemma 2.3. As mentioned just in front of the statement of
Lemma 2.3, we have $A_p(z) = 1 - \sum_{k=1}^{p} a_k(p)z^k \ne 0$ for all
$|z| \le 1$. Now assume that (2.16) is false. Then there exists a sequence
$\{p(k) : k\in\mathbb{N}\}\subset\mathbb{N}$, $p(k)\to\infty$, and a sequence
$\{z_k : k\in\mathbb{N}\}$ of complex numbers with $|z_k| \le 1 + 1/p(k)$
such that

$$ A_{p(k)}(z_k) \to 0 \quad\text{as } k\to\infty. \tag{A.3} $$

Let us first assume that we can find a subsequence of
$\{z_k : k\in\mathbb{N}\}$ which completely stays within the closed unit
disk. Without loss of generality, assume that $\{z_k\}$ itself has this
property. Since we have $A_p(z)\ne 0$ for all $|z|\le 1$ and because $A_p$ is
holomorphic, the minimum principle of holomorphic functions leads to

$$ |A_p(z)| \ge \min_{|z|=1}|A_p(z)| \quad\text{for all } |z|\le 1. \tag{A.4} $$

The set $\{z\in\mathbb{C} : |z| = 1\}$ is compact and $|A_p|$ is continuous,
thus there exists a $z_p^*$ with $|z_p^*| = 1$ and
$|A_p(z_p^*)| = \min_{|z|=1}|A_p(z)|$.

Without loss of generality, assume that $z_{p(k)}^*$ converges to a complex
number $z^*$ with $|z^*| = 1$. From the above, we have

$$ |A_{p(k)}(z_{p(k)}^*)| \le |A_{p(k)}(z_k)| \to 0 \quad\text{as } k\to\infty. \tag{A.5} $$

Writing

$$ A(z^*) = A(z^*) - A(z_{p(k)}^*) + A(z_{p(k)}^*) - A_{p(k)}(z_{p(k)}^*) + A_{p(k)}(z_{p(k)}^*) \tag{A.6} $$

and having in mind that $A_p(z)$ converges to $A(z)$ uniformly on the closed
unit disk because of Lemma 2.2, and regarding (A.5) as well as the continuity
of $A(z)$, we finally obtain $A(z^*) = 0$, which is a contradiction to
$A(z)\ne 0$ for all $|z| = 1$ [cf. below (2.5)].

Since we cannot find a subsequence of $z_k$ completely staying in the unit
disk, there exists a subsequence $(z_{k'})$ that completely stays in the
region $1 < |z| \le 1 + 1/p(k')$. Again assume without loss of generality
that $k' = k$ and that $z_k$ converges to some $z_0$, which necessarily must
fulfill $|z_0| = 1$.

We will show that $A(z_0) = 0$ holds, which again is a contradiction to
$A(z)\ne 0$ for all $|z| = 1$. To this end, let us write $A(z_0)$ in the
following way:

$$ A(z_0) = A_{p(k)}(z_k) + \sum_{j=1}^{p(k)}(a_j(p(k)) - a_j)z_k^j + \sum_{j=1}^{p(k)} a_j(z_k^j - z_0^j) - \sum_{j=p(k)+1}^{\infty} a_j z_0^j. \tag{A.7} $$

The first summand on the right-hand side converges to zero by (A.3), and
the last summand is bounded through $\sum_{j=p(k)+1}^{\infty}|a_j| \to 0$ as
$k\to\infty$. The second summand in turn is bounded by

$$ \sup_{|z|\le 1+1/p}\Bigl|\sum_{j=1}^{p}(a_j(p) - a_j)z^j\Bigr| \le \sum_{j=1}^{p}|a_j(p) - a_j|\sup_{|z|\le 1+1/p}|z|^p \to 0 \quad\text{as } p\to\infty. \tag{A.8} $$

For the third summand, which reads

$$ \sum_{j=1}^{\infty} a_j(z_k^j - z_0^j)\,\mathbf{1}\{j\le p(k)\}, \tag{A.9} $$

one obtains by dominated convergence [recall that $|z_k|^j$ for $j\le p(k)$
is bounded by 3 and that $z_k\to z_0$] also convergence to zero.
This concludes the proof of Lemma 2.3. □

Proof of Lemma 2.4. Under the assumptions, the autoregressive coefficients
$a_k$ have the following property:

$$ a := (1, -a_1, -a_2, \ldots) \in \ell_\nu^1 := \Bigl\{(x_j : j\in\mathbb{N}_0)\subset\mathbb{C} \;\Big|\; \sum_{j=0}^{\infty}(1+j)^r|x_j| < \infty\Bigr\}. \tag{A.10} $$

Because of $1 - \sum_{j=1}^{\infty} a_j z^j \ne 0$ for all $|z|\le 1$ (cf.
Corollary 2.1), we have from Gröchenig [(2007), Theorem 6.2] that a
multiplicative inverse $a^{-1}\in\ell_\nu^1$ exists. For this result, observe
that our weight function $\nu(k) = (1+k)^r$ satisfies the so-called
Gelfand-Raikov-Shilov (GRS) condition

$$ \nu(nk)^{1/n} \to 1 \quad\text{as } n\to\infty; \tag{A.11} $$

cf. Gröchenig (2007), Lemma 2.1.
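Both properties of the polynomial weight used here, submultiplicativity and the GRS condition, can be checked numerically for concrete values. The sketch below is only a sanity check over a finite range, not a proof; the function names `nu`, `is_submultiplicative` and `grs_root`, and the extension of the weight to negative integers via the absolute value, are assumptions made for this illustration.

```python
def nu(k, r):
    """Polynomial weight nu(k) = (1 + |k|)^r."""
    return (1.0 + abs(k)) ** r

def is_submultiplicative(r, max_abs=15, tol=1e-12):
    """Check nu(n) <= nu(m) * nu(n - m) for all |n|, |m| <= max_abs."""
    return all(
        nu(n, r) <= nu(m, r) * nu(n - m, r) + tol
        for n in range(-max_abs, max_abs + 1)
        for m in range(-max_abs, max_abs + 1)
    )

def grs_root(k, r, n):
    """The quantity nu(n * k)^(1/n), which should approach 1 as n grows."""
    return nu(n * k, r) ** (1.0 / n)

# The n-th root approaches 1 because polynomial growth is subexponential.
roots = [grs_root(5, 2.0, n) for n in (10, 1000, 10 ** 6)]
```

An exponential weight such as $e^{c|k|}$ with $c > 0$ would be submultiplicative but would fail the GRS condition, which is why polynomial decay rates are the natural setting here.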

Since multiplication here is the usual convolution of sequences, we have
that $a^{-1} = (1, \alpha_1, \alpha_2, \ldots)$, where the coefficients
$\alpha_k$ coincide with the power series coefficients of
$(1 - \sum_{k=1}^{\infty} a_k z^k)^{-1}$. The assertion
$a^{-1}\in\ell_\nu^1$ then just means that we have for the coefficients
$\alpha_k$ in (2.19)

$$ \sum_{j=0}^{\infty}(1+j)^r|\alpha_j| < \infty. \tag{A.12} $$
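The inverse under convolution can be computed term by term: matching coefficients in $(1 - \sum_k a_k z^k)(\sum_j \alpha_j z^j) = 1$ gives $\alpha_0 = 1$ and $\alpha_j = \sum_{k=1}^{j} a_k\alpha_{j-k}$ for $j \ge 1$. The following sketch implements this recursion for a finite autoregressive coefficient vector; the function name `invert_ar_polynomial` and the example coefficients are hypothetical.

```python
def invert_ar_polynomial(a, m):
    """Coefficients alpha_0, ..., alpha_m of 1 / (1 - sum_k a[k-1] * z^k).

    a : list of autoregressive coefficients a_1, ..., a_p
    m : highest power series coefficient to compute
    """
    alpha = [1.0]
    for j in range(1, m + 1):
        # alpha_j = sum_{k=1}^{min(j, p)} a_k * alpha_{j-k}
        alpha.append(
            sum(a[k - 1] * alpha[j - k] for k in range(1, min(j, len(a)) + 1))
        )
    return alpha

# AR(1) with coefficient 0.5: the inverse has coefficients 0.5 ** j.
alpha_ar1 = invert_ar_polynomial([0.5], 6)
```

Convolving the sequence $(1, -a_1, \ldots, -a_p)$ with the computed coefficients returns $(1, 0, 0, \ldots)$ up to the truncation point, which is the defining property of $a^{-1}$.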

Exactly along the same lines, we obtain [cf. (2.19)]

$$ a(p)^{-1} = (1, -a_1(p), -a_2(p), \ldots, -a_p(p), 0, \ldots)^{-1} = (1, \alpha_1(p), \alpha_2(p), \ldots)\in\ell_\nu^1. \tag{A.13} $$

We have

$$ \begin{aligned}
\sum_{k=0}^{\infty}(1+k)^r|\alpha_k(p) - \alpha_k|
&= \|a(p)^{-1} - a^{-1}\|_\nu \\
&= \|a(p)^{-1}(a - a(p))a^{-1}\|_\nu \\
&\le \|a(p)^{-1} - a^{-1}\|_\nu\,\|a - a(p)\|_\nu\,\|a^{-1}\|_\nu + \|a^{-1}\|_\nu^2\,\|a - a(p)\|_\nu.
\end{aligned} $$

Simple algebra finally leads to

$$ \sum_{k=0}^{\infty}(1+k)^r|\alpha_k(p) - \alpha_k| = \|a(p)^{-1} - a^{-1}\|_\nu \le \frac{\|a^{-1}\|_\nu^2\,\|a - a(p)\|_\nu}{1 - \|a^{-1}\|_\nu\,\|a - a(p)\|_\nu}. $$

Recall that
$\|a - a(p)\|_\nu = \sum_{k=1}^{p}(1+k)^r|a_k - a_k(p)| + \sum_{k=p+1}^{\infty}(1+k)^r|a_k|$
in order to obtain from Lemma 2.2 the desired assertion. □

Proof of Lemma 2.5. For simplicity, we write $p$ instead of $p(n)$. From
Lemma 2.3, we have that the polynomial $A_p(z)$ has no zeroes with magnitude
less than or equal to $1 + 1/p$. Since we easily get from assumption (B)
convergence of $\hat A_p(z) = 1 - \sum_{k=1}^{p}\hat a_k(p)z^k$ to $A_p(z)$
uniformly on the closed disk with radius $1 + 1/p(n)$, the polynomial
$\hat A_p(z)$ does not possess zeroes with magnitude less than or equal to
$1 + 1/p(n)$. Therefore, Cauchy's inequality for holomorphic functions
applies and yields

$$ \begin{aligned}
|\hat\alpha_k(p) - \alpha_k(p)|
&\le \frac{1}{(1+1/p)^k}\max_{|z|=1+1/p}\bigl|\hat A_p(z)^{-1} - A_p(z)^{-1}\bigr| \\
&= \frac{1}{(1+1/p)^k}\max_{|z|=1+1/p}\frac{|\hat A_p(z) - A_p(z)|}{|\hat A_p(z)A_p(z)|},
\end{aligned} $$

which yields the bound $(1+1/p)^{-k}\,p^{-2}\,O_P(1)$ asserted in Lemma 2.5. □

Proof of Theorem 3.1. A careful inspection of the proof of Theorem 3.3
in Bühlmann (1997) [see also the corresponding technical report Bühlmann
(1995)] shows that only the following properties of the underlying, the
companion and the fitted autoregressive process really are needed:

(i) $\varepsilon_t^* \to \tilde\varepsilon_t$ in distribution, in probability,

(ii) $(X_{t_1}^*,\ldots,X_{t_d}^*) \to (\tilde X_{t_1},\ldots,\tilde X_{t_d})$ in distribution, in probability,

(iii) $\sum_{j=0}^{\infty}|\hat\alpha_j(p(n)) - \alpha_j| \to 0$ as $n\to\infty$, in probability,

(iv) $\sum_{j=0}^{\infty} j|\hat\alpha_j(p(n))|$ is uniformly bounded in probability,

(v) $\sum_{j=0}^{\infty} j|\alpha_j| < \infty$,

(vi) the empirical moments of $\hat\varepsilon_t(p(n))$ converge for orders
up to $2(h+2)$ to the moments of $\tilde\varepsilon_1$,

(vii) the autoregressive representation of infinite order of the process
$(\tilde X_t)$ is invertible,

(viii) Yule-Walker parameter estimators are used for the autoregressive
fit of order $p(n)$ to the data $X_1,\ldots,X_n$.

Because of (A4) and (A5) and the easily obtained fact that for the Mallows
metric $d_2$

$$ d_2(\hat F_n, F_n) \to 0 \quad\text{in probability}, \tag{A.14} $$

where $\hat F_n$ denotes the empirical distribution function of the centered
residuals $\hat\varepsilon_t(p(n))$, $t = p(n)+1,\ldots,n$, of the
autoregressive fit and $F_n$ denotes the empirical distribution function of
fictitious observations $\varepsilon_t(p(n))$, $t = p(n)+1,\ldots,n$, we
obtain (i).

(ii) is obtained exactly along the lines as in Corollary 5.6 of Bühlmann
(1997).

To see (iii), recall from Section 2 that we have
$|\hat\alpha_j(p(n)) - \alpha_j(p(n))| = (1 + 1/p(n))^{-j}\cdot p^{-2}\,O_P(1)$
as well as
$\sum_{j}|\alpha_j(p(n)) - \alpha_j| \le c\cdot\sum_{j=p(n)+1}^{\infty} j|a_j|$,
which converges to 0 as $n$ goes to infinity. These two assertions ensure
(iii).

(iv) is obtained exactly along these lines by using the fact that
$\sum_{j=1}^{\infty} j|\alpha_j| < \infty$, cf. (2.29), which also is (v).

Furthermore, it is easy to see that the difference of the empirical moments
(up to the necessary order) of $\hat\varepsilon_t(p(n))$ and of
$\varepsilon_t$ converges to zero due to the bounds for
$\hat\alpha_j(p) - \alpha_j(p)$, $\alpha_j(p) - \alpha_j$ and $\alpha_j$, cf.
(2.24), (2.21) and (A.12). Together with (A5), we obtain (vi).

Finally, we use Yule-Walker parameter estimators for the autoregressive fit,
and the autoregressive representation of $(\tilde X_t)$ is invertible (cf.
Section 2). This concludes the proof of Theorem 3.1. □

Proof of Theorem 3.2. Also due to the results of Dahlhaus (1985),
we obtain that the distribution of
$\sqrt{n}(M(\tilde I_n,\varphi) - M(f_X,\varphi))$, where $\tilde I_n$
denotes the periodogram of $n$ observations of the autoregressive companion
process $(\tilde X_t)$, cf. (3.2), is asymptotically normal with mean zero
and variance

$$ \Bigl(\frac{E\tilde\varepsilon_t^4}{(E\tilde\varepsilon_t^2)^2} - 3\Bigr)\Bigl(\int_0^{\pi}\varphi(\lambda) f_X(\lambda)\,d\lambda\Bigr)^2 + 2\pi\int_0^{\pi}\varphi^2(\lambda) f_X^2(\lambda)\,d\lambda. \tag{A.15} $$

Thus, it suffices to show that the bootstrap approximation
$\sqrt{n}(M(I_n^*,\varphi) - M(\hat f_{AR},\varphi))$ shares the same
asymptotic distribution. Exactly along the lines of the proof of Theorem 4.1
in Kreiss and Paparoditis (2003) (without the additional nonparametric
correction considered therein), which makes use of Proposition 6.3.9 of
Brockwell and Davis (1991), we obtain the desired result. □

Proof of Theorem 3.3. Let
$\tilde f_n(\lambda) = \int K_h(\omega - \lambda)\tilde I_n(\lambda)\,d\lambda$
and consider $Y_n = \sqrt{nh}(\tilde f_n(\lambda) - f_X(\lambda))$. Since
$Y_n$ converges to a Gaussian distribution with mean and variance as in
(3.21) and (3.22), respectively, it suffices to show that
$\sqrt{nh}(\hat f_{n,X}^*(\lambda) - \hat f_{AR}(\lambda))$ shares exactly the
same asymptotic behavior as $Y_n$. This, however, follows exactly along the
same lines as in the proof of Theorem 5.1 in Kreiss and Paparoditis (2003),
again without the additional nonparametric correction considered therein. □